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ABSTRACT 

In order to cope with the increased data volumes generated by mod- 
ern radio interferometers such as LOFAR (Low Frequency Array) 
or SKA (Square Kilometre Array), fast and efficient calibration al- 
gorithms are essential. Traditional radio interferometric calibration 
is performed using nonlinear optimization techniques such as the 
Levenberg-Marquardt algorithm in Euclidean space. In this paper, 
we reformulate radio interferometric calibration as a nonlinear opti- 
mization problem on a Riemannian manifold. The reformulated cali- 
bration problem is solved using the Riemannian trust-region method. 
We show that calibration on a Riemannian manifold has faster con- 
vergence with reduced computational cost compared to conventional 
calibration in Euclidean space. 

Index Terms — Calibration, Interferometry: Radio interferom- 

etry 

1. INTRODUCTION 

Radio interferometric calibration is the estimation of errors intro- 
duced by the propagation medium (such as the ionosphere) and by 
the receivers (such as the beam shape). In order to produce high fi- 
delity and high dynamic range images, calibration is essential. While 
contemporary radio interferometric arrays at most have a few tens 
of receivers (or stations), there is a trend towards building large ra- 
dio interferometers with hundreds of receivers, an example being the 
Square Kilometre Array (SKA). This naturally leads to data volumes 
that are by far greater than what is produced by contemporary radio 
telescopes. 

The maximum likelihood estimation of calibration parameters is 
in fact a nonlinear optimization problem. Currently, nonlinear opti- 
mization algorithms such as the Levenberg-Marquardt (LM) method 
Q]|2) are used in radio interferometric calibration j3]. The cost func- 
tion that is minimized during calibration is invariant to multiplication 
of the parameters by a 2 by 2 unitary matrix. Therefore, the solutions 
acquired by calibration will have a unitary matrix ambiguity |4). 

In this paper, we present the 'quotient manifold' geometry [5 1 of 
the calibration parameters, which is a better representation of their 
invariance to multiplication by 2 by 2 unitary matrices. We further 
develop the geometric structure of calibration parameters, first pre- 
sented in |6|. Rather than minimizing the cost function in Euclidean 
space, as is currently done, we minimize the cost function on the 
developed quotient manifold. We use the Riemannian Trust-Region 
(RTR) method (7) for minimizing the cost function. 

Optimization on matrix manifolds has developed significantly 
during the past decade and a complete overview can be found in 
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0. In particular, when there is an underlying symmetry in the pa- 
rameter space (such as the invariance to multiplication by a unitary 
matrix), exploiting the geometric structure yields better performing 
algorithms |f8] [9]|^0| . 

Moreover, algorithms such as the LM operate in real parame- 
ter space and the cost of calibration of an interferometric array with 
hundreds of elements is significant, mainly due to the increased size 
of the Jacobian II II . In this paper, we treat calibration parameters as 
complex numbers and because we employ the RTR method (7), the 
computational and memory costs are reduced. The novelty of the 
work presented in this paper (relation to prior work) is as follows: 
(i) We present the quotient manifold geometry of radio interferomet- 
ric calibration, improving on @. (ii) We reformulate radio inter- 
ferometric calibration as an optimization problem on a Riemannian 
manifold, where we derive expressions for the Riemannian gradient 
and the Hessian, following (8j. (iii) We apply the RTR method (7) 
for calibration instead of the traditional Euclidean space calibration 
algorithms. 

The rest of the paper is organized as follows: In section [2] we 
give an overview of radio interferometric calibration. Next, in sec- 
tion^ we present the geometric structure of calibration parameters. 
We present the Riemannian gradient and Hessian operators in sec- 
tion [4] for the calibration cost function. Simulation results are pre- 
sented in section [5] where we apply the RTR method for calibration 
and finally, we draw our conclusions in section [6] 

Notation: Matrices and vectors are denoted by bold upper and 
lower case letters as J and v, respectively. The transpose and the 
Hermitian transpose are given by (.) T and respectively. The 
matrix Frobenius norm is given by ||.||. The set of real and complex 
numbers are denoted by R and C, respectively. The identity matrix 
is given by I. The matrix trace operator is given by trace(.). 

2. RADIO INTERFEROMETRIC CALIBRATION 

In this section, we present radio interferometric calibration as an op- 
timization problem. Consider a radio interferometric array with N 
receivers. The observed data at a baseline formed by two receivers, 
p and q is given by 1121 

V ' pq — JpCpq 

J? +N P9 (1) 

where V p(3 (G C 2x2 ) is the observed visibility matrix. The er- 
rors that need to be calibrated for station p and q are given by the 
Jones matrices 3 P , J q (£ C 2x2 ), respectively. The sky signal (or co- 
herency) is given by C pq (6 C 2x2 ). The noise matrix N pq (G C 2x2 ) 
is assumed to have complex, zero mean, circular Gaussian elements. 

For an array with N receivers, we can form at most N(N —l)/2 
baselines that collect visibilities as in Q}. We rewrite ([TJ as 

V pq — ApJCpq J Ag -(- N^pg (2) 



where J (€ C 2JVx2 ) is the augmented matrix of Jones matrices of 
all stations, 

J = [3l,3l,...,3 T N ] T (3) 

and A p (G R 2x2iv ) (and A q likewise) is the canonical selection 
matrix 

A p = [0,0,...,I,...,0]. (4) 
In (|4j, all elements of A p are zero except the p-th block which is an 
identity matrix. 

Calibration is the estimation of J given the visibilities as in (QJ. 

Under a Gaussian noise model, the Maximum Likelihood estimate 

is ^ 

J = argmin/(J) (5) 
j 

where the nonlinear cost function /(J) is 

/(J) = ^||V p? -A p JC pg J H A^|| 2 . (6) 

The sky signal almost always has very little polarization and 
therefore, the coherencies C P9 in QJ are diagonal matrices. There- 
fore, for any unitary U (G C 2x2 ), we see that /(J) = /(JU). In 
other words, for any solution J, a feasible solution for $5$ would 
also be JU where U is unitary. Currently, a solution for (|5} is ob- 
tained by well known nonlinear optimization methods such as the 
Levenberg-Marquardt [1. 2 | method and an in-depth overview of 
current calibration approaches can be found in e.g., J5]. 
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Fig. 1. The quotient manifold geometry of the calibration parame- 
ters. The dashed (blue) line (on Ai) represents the equivalence class 
of all solutions that are related to J by a unitary ambiguity. This 
equivalence class is represented by a single point on the quotient 
manifold Ai — Ai/ ~. The vertical space Vj Ai is the vector space 
tangential to the equivalence class and the horizontal space HjAi is 
the orthogonal complement. 



3. GEOMETRIC STRUCTURE OF CALIBRATION 

In this section, we present the manifold geometric structure of the 
parameters J used in radio interferometric calibration. A manifold 
can be described as a set of entities, together with a set of mappings 
(or charts) that can locally describe the manifold in Euclidean space. 
For a more formal introduction to matrix manifolds, the reader is re- 
ferred to |3J. A problem very similar to what we consider in this sec- 
tion (involving real symmetric positive semi-definite matrices) can 
be found in jl] and we follow the same approach. 

Given the solution to {3), i.e. J, we know that JU is also a 
feasible solution. We say J and JU are similar, i.e., 

J ~ JU (7) 

when U is any unitary matrix. Therefore, the whole set of feasible 
solutions JU where U is any unitary matrix can be represented by 
one of its elements, J. We consider Ai to be the manifold of all 
2N x 2 complex matrices (C 2iVx2 ). While the whole set of feasible 
solutions lie on Ai, using the quotient manifold Ai — Ai/ ~ we 
can represent the whole set by a single point as shown in Fig. [T] 

The mapping n (canonical projection) is defined such that any 
matrix JU on Ai is mapped onto a single point, 7r(J) on Ai. With 
this mapping, we define the equivalence class 

7r _1 (7r( J )) = {JU : UU H = U ff U = I, U G C 2x2 } (8) 

of solutions represented by a single point on AI. In order to make 
Ai a Riemannian manifold, we introduce the (smooth) inner product 
<7j(., .) to its tangent space 7jA4 as 

Pj(£j,?7j) = trace(£f r/j +r)j£j), £j,r]j G TjM. (9) 

With ©, we can decompose 7j Ai into two complementary vec- 
tor spaces as 

TjM = VjAi © HjM (10) 



where © is the direct sum operator. We define the vertical space to 
be the directions tangential to the equivalence class at J, i.e., 

VjM = {m : fl H = -ft, ft G C 2x2 } (11) 

and we choose the set of directions orthogonal to the equivalence 
class at J as the horizontal space H.jAi, 

HjM = {£j G C 2iVx2 : G H J = (12) 

The proof of J 1 2t is easy to obtain: Let rjj — Jfl G VjAi then by 
making gj(£j,ru) = 0, we get (O. 

The projection of any direction Z G C 2JVx2 onto the horizontal 
space at J is given by 

YlnjM(Z) = Z-m (13) 

where f2 (G C 2x2 ) is skew-Hermitian and (because Z — Jf2 G 
HjAi) satisfies the Sylvester equation 

fiJ H J + J ff Jfi = 3 H Z-Z H 3. (14) 

A retraction is a mapping from TjAi to Ai. There are many 
possible retractions but we choose a simple formula for the retraction 

as 

i?j(£i) = J + £j- d5) 

4. CALIBRATION USING A RIEMANNIAN MANIFOLD 

Rather than solving l[5} in Euclidean space, we minimize the cost 
function /(J) on Ai. In order to do this, we need to compute the 
Riemannian gradient and the Riemannian Hessian of /(J). The Rie- 
mannian gradient grad(/(J)) is the unique operator that satisfies 

ffj(£j,grad(/(J))) = D/(J)Kj], V£j G TjM (16) 



where, 



(a) 



<b) 



/(J) 



Using l(6j and (|9), we get 

grad(/(J)) 



(17) 



(18) 



-J2 (Aj(V p , - A P JC P ,J H A^)A,JCJ 
+ A^(V P , - A p JC p qJ H A^) ff A P JC P9 ) 



and the horizontal lift of grad(/(J)) to MjM is 



grad(/(J)) = U nj M (grad(/(J))) . 
The Riemannian Hessian is defined as 



(19) 



Hess/(J)[?7j] = TlnjM ( limi (grad/(J + tr]j) - grad/(J)) 



where 



lim- (grad/(J + tr,j) - grad/(J)) = 

(Aj ((V P9 - A p JC p ,J ff A^A^j 



(20) 
(21) 



A P (JC P9 77? + r,jC p ,j")A^A 9 Jj c£ 
A^ ((V p , - A P JC P9 J H A^) H A P 77J 
A q {JC pqV ? + 77jC p(J J ff ) H AjA p j) C 



Note that for notational purposes we write products such as A p J in 
the above expressions but we do not actually form a matrix product 
because A p -s are merely selection matrices. 

With the Riemannian gradient and Hessian at hand, we apply 
the Riemannian trust-region method J7) to our problem. The trust- 
region method solves the problem 



m ^ n >J( J } + ffj(s rad (/( J ))> 7 ? J ) + ofj ( Hess f( J )lvj]>vj) 

subject to gj(rjj,rjj) < 8 2 , where 8 is the trust-region radius. 

The computational cost of the RTR method is significantly less 
compared with the LM method mainly due to the following reason. 
In the LM method, with N stations, the Jacobian is a matrix of size 
8 x N(N - 1) /2 by 8A with real entries. The multiplication of the 
transpose of the Jacobian with itself has cost O ((8A) 2 4A (N - 1)) 
and the linear system solved is of size 8A. On the other hand, in the 
RTR method, both the gradient and the Hessian are of size 2A x 2 
with complex entries. Moreover, no full linear system is solved 
(since the truncated conjugate gradient method is used Q 1131 ), ex- 
cept in solving J 14b . which is only a linear system of order 4. 

5. SIMULATION RESULTS 

In this section we compare the performance of the proposed cali- 
bration approach against conventional calibration. For conventional 
calibration, we consider two optimization algorithms: LM algorithm 
and Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm 0131 . For 
the LM algorithm, we use closed form Jacobian calculation and for 
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Fig. 2. Reduction of the cost function /(J) with the time taken for 
the three optimization algorithms, (a) the RTR method (b) the LM 
method (c) the BFGS method. The number of stations N — 30 and 
the SNR = 100. It is clear that the RTR method uses much less 
time to reach the minimum cost while both other algorithms take 
significantly longer times. 



BFGS we use closed form gradient calculation (i.e. not using finite 
differences). We used the MATLAB implementation of the RTR 
method 1 14) in our simulations. 

We simulate an array of N receivers where N is varied. The 
error matrices J p , J q in |QJ are generated with their elements hav- 
ing values drawn from a complex uniform distribution in [0, 1] as 
U(0, 1) + jU(0, 1). The sky signal is kept at unity, i.e. C P9 = I. 
The noise matrix N p<3 is simulated to have complex circular Gaus- 
sian random variables. The variance of the noise is changed accord- 
ing to the signal to ratio (SNR) 



SNR 



y 



(22) 



The initial values for the parameters are set as J p = I for p G 
[1, N]. For the RTR method, the upper bound for the trust region 
radius S is chosen as 



A 



(23) 



and the initial trust region radius is chosen as 8/ 10. 

In Fig. [2] we show the reduction of the cost /(J) for N — 30 
and SNR = 100 for the three algorithms. The computing time was 
measured using a single Intel Xeon CPU core. It is evident that 
the RTR method takes significantly less time to reach the minimum 
cost. Furthermore, Fig. |2]shows that both three algorithms reach the 
minimum cost (i.e. they converge). 

In the next simulation, we vary both N and the SNR. For each 
value of N, the SNR is changed to 50, 100, 150, and 200 and the 
computation time taken by each algorithm to reach convergence is 
measured. Once again, we use a single CPU core for the compu- 
tations. The results are given in Fig. [3] In Fig. [3] we present the 
average computing time taken for all values of SNR. The superior- 
ity of the RTR method is once again highlighted in this figure. 




100 200 300 100 200 300 100 200 300 
No. of Stations (N) 



Fig. 3. Average computation time for the three algorithms for var- 
ious values of N. (a) the RTR method (b) the LM method (c) the 
BFGS method. The noise is varied with SNR = 50, 100, 150, and 
200 for each value of N. The RTR method takes significantly less 
time than the other two algorithms. 



The average residual error for all values of SNR (or the value 
of /(J) at convergence) is shown in Fig. [4] It is clear that all three 
methods reach the same final cost at convergence. 

6. CONCLUSIONS 

We have presented the geometric structure in the form of a Rieman- 
nian quotient manifold that can be used in radio interferometric cal- 
ibration. We have derived the Riemannian gradient and Hessian op- 
erators to minimize the cost function used in calibration. By em- 
ploying the Riemannian trust-region method, we have proposed a 
computationally efficient calibration method. Based on simulation 
results, we have shown that the proposed calibration algorithm is 
much faster and also efficient in memory usage, compared with ex- 
isting calibration algorithms that operate in Euclidean space. 
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